Voice activity detection in personal audio recordings using autocorrelogram compensation

Authors

  • Keansub Lee
  • Daniel P. W. Ellis
Abstract

This paper presents a novel method for identifying regions of speech in the kinds of energetic and highly variable noise present in ‘personal audio’ collected by body-worn continuous recorders. Motivated by psychoacoustic evidence that pitch is crucial in the perception and organization of sound, we use a noise-robust pitch detection algorithm to locate speech-like regions. To avoid false alarms resulting from background noise with strong periodic components (such as air-conditioning), we add a new channel selection scheme to suppress frequency subbands where the autocorrelation is more stationary than that encountered in voiced speech. Quantitative evaluation shows that these harmonic noises are effectively removed by this compensation technique in the autocorrelogram domain, and that detection performance is significantly better than that of existing algorithms for detecting the presence of speech in real-world personal audio recordings.
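
The channel selection idea in the abstract can be illustrated with a rough Python sketch. This is not the authors' implementation: the brick-wall FFT band split, the cosine-similarity stationarity measure, and every function name and parameter below (`split_bands`, `channel_stationarity`, `voicing_score`, the frame sizes) are assumptions chosen only to show the general shape of the technique. It builds a per-subband short-time autocorrelation (an autocorrelogram), scores how little each subband's autocorrelation changes over time, and computes a pitch-based voicing score from the subbands that are kept.

```python
import numpy as np

def split_bands(x, sr, edges):
    """Crude brick-wall band split via FFT masking (assumption: stands in
    for a proper auditory filterbank). Returns (n_bands, n_samples)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(spec * mask, n=len(x)))
    return np.array(bands)

def frame_autocorr(x, frame_len, hop, max_lag):
    """Normalized short-time autocorrelation, shape (n_frames, max_lag + 1).
    Requires max_lag < frame_len."""
    n_frames = 1 + (len(x) - frame_len) // hop
    out = np.zeros((n_frames, max_lag + 1))
    for t in range(n_frames):
        seg = x[t * hop : t * hop + frame_len]
        seg = seg - seg.mean()
        full = np.correlate(seg, seg, mode="full")
        ac = full[frame_len - 1 : frame_len + max_lag]  # lags 0..max_lag
        if ac[0] > 0:
            out[t] = ac / ac[0]
    return out

def autocorrelogram(bands, frame_len, hop, max_lag):
    """Stack per-band autocorrelations: (n_bands, n_frames, n_lags)."""
    return np.stack([frame_autocorr(b, frame_len, hop, max_lag) for b in bands])

def channel_stationarity(acg):
    """Hypothetical stationarity score per band: mean cosine similarity
    between each frame's autocorrelation and the band's time average.
    Values near 1 mean the autocorrelation barely changes over time."""
    mean_ac = acg.mean(axis=1, keepdims=True)
    num = (acg * mean_ac).sum(axis=2)
    den = np.linalg.norm(acg, axis=2) * np.linalg.norm(mean_ac, axis=2) + 1e-12
    return (num / den).mean(axis=1)

def voicing_score(acg, keep, min_lag, max_lag):
    """Per-frame peak of the summary autocorrelation over the kept bands,
    restricted to a plausible pitch-lag range."""
    if not keep.any():
        return np.zeros(acg.shape[1])
    summary = acg[keep].mean(axis=0)
    return summary[:, min_lag : max_lag + 1].max(axis=1)
```

In use, bands whose stationarity score exceeds some tuned threshold would be dropped (for example `keep = channel_stationarity(acg) < threshold`) before calling `voicing_score`, so that a steady hum does not register as voiced speech. The threshold and the pitch-lag range are tuning choices that this sketch leaves open.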

Similar resources

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...

A Multi-Modal Database from Home Entertainment

This paper presents a new database containing high-definition audio and video recordings in a rather unconstrained video-conferencing-like environment. The database consists of recordings of people sitting around a table in two separate rooms communicating and playing online games with each other. Extensive annotation of head positions, voice activity and word transcription has been performed on...

The Ta2 Database - a Multi-modal Database from Home Entertainment

This paper presents a new database containing high-definition audio and video recordings in a rather unconstrained video-conferencing-like environment. The database consists of recordings of people sitting around a table in two separate rooms communicating and playing online games with each other. Extensive annotation of head positions, voice activity and word transcription has been performed on...

New Algorithms for Wow and Flutter Detection and Compensation in Audio

New algorithms were developed for discriminating wow from natural musical effects, such as: periodicity detection by means of the autocorrelation signal, an algorithm employing an AR model for power line hum frequency detection, and an algorithm for estimating the pitch variation curve employing wow tracking based on recording bias detection in magnetic recordings. Moreover, a non-uniform resampling routine was im...

Towards Voice and Query Data-based Non-invasive Screening for Laryngeal Disorders

The topic of this research is the exploration and fusion of non-invasive measurements for accurate detection of a pathological larynx. Measurements for each human subject encompass results of a specific survey and information extracted by the openSMILE toolkit from several audio recordings of sustained phonation (vowel /a/). The clinical diagnosis, assigned by a medical specialist, is the target attribute for binary cl...

Publication date: 2006